ABSTRACT

Rooted binary trees with weighted nodes are 
structures encountered in many areas, such as coding theory, 
searching and sorting, information storage and retrieval. The path 
length is a meaningful quantity which gives indications about the 
expected time of a search or the length of a code, for example. In 
this paper, two sharp bounds for the total path length of general 
weighted node trees are derived. (Author)

Rooted binary trees with weighted nodes are structures encountered in many
areas, such as coding theory, searching and sorting, information storage and 
retrieval. 

A common quantity of great importance is the weighted path length. An
estimate of it gives indications about the expected time of a search or the length
of a code, for example. The knowledge of lower and upper bounds would permit such
estimates.

The noiseless coding theorem in Information Theory provides a lower
bound for the weighted root-leaf path length. But, until recently, upper bounds,
which are the more meaningful for many applications, were still lacking. They
were first obtained for unweighted trees, introducing the concept of structural
balance of a tree [1]. Then upper bounds for various weighted trees were
derived using structural balance and the new concept of weight balance of a tree [2].

In this paper, using a different definition of the weight balance of a 
tree, we derive two upper bounds for the total path length of general weighted 
node trees. 

The first one introduces two parameters $\gamma$ and $\beta_T$ which sharpen the bound
but complicate its expression. The parameter $\gamma$ will be useful every time we have
a certain knowledge of the weight distribution. However, if the estimation of $\gamma$
or $\beta_T$ requires too much computation, we can take them equal to zero and derive
from this upper bound a simpler one.

The second upper bound, using a different normalization, will be useful 
when the entropy of the weights of the nodes is not known. 




A binary tree $T_n$ is a finite set of $n$ nodes which is either empty
(if $n = 0$) or else is partitioned into the following three classes: a single node
$r$ called the root of $T_n$, a binary tree $T_g$ on the $g$ nodes $1, \ldots, r-1$ called the left
subtree of the root, and a binary tree $T_d$ on the $d$ nodes $r+1, \ldots, n$ called the right
subtree of the root ($r = g + 1$, $g + d + 1 = n$).

The subscripts $g$ and $d$ will always refer respectively to the left and
right subtrees of $T_n$. In the above definition they have two meanings: they
indicate both the subtrees and their numbers of nodes.

A weighted binary tree is a binary tree $T_n$ such that a non-negative real
number $w_k$, called a weight, is assigned to each node $k$ of $T_n$. We denote a weighted
binary tree of $n$ nodes by the $(n+1)$-tuple $(T_n; w_1, \ldots, w_n)$.

$W$, $W_g$, $W_d$ will denote respectively the sum of the weights of $T_n$, $T_g$,
and $T_d$.

Weight distribution. We will restrict the weight distribution to the
following case: at least one of the two sons of each non-terminal node must have
a strictly positive weight.

The weighted path length $|T_n|$ of a tree $T_n$ is defined as the sum over all
the nodes of the product of the weight of the node and the level of the node. The
weighted path length satisfies the following equality:

$$|T_n| = |T_g| + |T_d| + W_g + W_d \qquad (n > 1)$$
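The definition and the recurrence can be compared on a small example. The sketch below is only an illustration: the tuple representation `(weight, left, right)`, the function names, and the example weights are assumptions introduced here, and the root is taken to be at level 0, which is what the recurrence implies.

```python
# Hypothetical representation (not from the paper): a node is a tuple
# (weight, left, right); a terminal node has left = right = None.

def total_weight(t):
    """Sum of the weights of all nodes of t (0 for an empty subtree)."""
    if t is None:
        return 0.0
    w, left, right = t
    return w + total_weight(left) + total_weight(right)

def path_length_by_levels(t, level=0):
    """Sum over all nodes of weight * level, with the root at level 0."""
    if t is None:
        return 0.0
    w, left, right = t
    return (w * level
            + path_length_by_levels(left, level + 1)
            + path_length_by_levels(right, level + 1))

def path_length_by_recurrence(t):
    """|T_n| = |T_g| + |T_d| + W_g + W_d; a one-node tree gives 0."""
    if t is None:
        return 0.0
    _, left, right = t
    return (path_length_by_recurrence(left) + path_length_by_recurrence(right)
            + total_weight(left) + total_weight(right))

# Root of weight 5, left son of weight 2 with two terminal sons (1 and 3),
# right terminal son of weight 4.
example = (5.0, (2.0, (1.0, None, None), (3.0, None, None)), (4.0, None, None))
print(path_length_by_levels(example), path_length_by_recurrence(example))
```

Both functions return 14 for this tree: the sons at level 1 contribute 2 + 4 and the terminal nodes at level 2 contribute 2(1 + 3).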

Weighted root balance $\rho(T_n)$. The two upper bounds depend on a parameter
which measures the "balance" of a tree, in the sense of how close the total weights
of the left and right subtrees of the root are to each other. The following
definition partially serves this purpose: $\rho(T_n) = \frac{1}{2}$ if $n = 1$; otherwise

$$\rho(T_n) = \min\left(\frac{W_g}{W - w_r}, \frac{W_d}{W - w_r}\right).$$

This definition implies $0 \le \rho(T_n) \le \frac{1}{2}$. According to the restriction
made previously about the weight distribution, $\rho(T_n) = 0$ whenever $T_n$ has a right
or left subtree of one node, the weight of which is equal to zero ($W_g \cdot W_d = 0$).
We will see later that such a value must be avoided because it would give a bad
estimate of the weighted balance of $T_n$. We notice that the weighted path length
$|T_n|$ remains the same if the weights of the two sons of the root of $T_n$ are interchanged. Let
$w$ denote the positive weight among the two sons of the root of $T_n$ when the condition
$W_g \cdot W_d = 0$ arises. Then the following definition:

$\rho(T_n) = \frac{1}{2}$ if $n = 1$; otherwise

if $W_g \cdot W_d \ne 0$, then $\rho(T_n) = \min\left(\dfrac{W_g}{W - w_r}, \dfrac{W_d}{W - w_r}\right)$

if $W_g \cdot W_d = 0$, then $\rho(T_n) = \min\left(\dfrac{w}{W - w_r}, \dfrac{W_g + W_d - w}{W - w_r}\right)$

makes the weighted root balance strictly positive, except for the weighted binary
trees of three nodes when one son of the root has a weight equal to zero.

The weighted balance $\beta(T_n)$ is equal to $\frac{1}{2}$ if $n = 1$ or $n = 3$; otherwise
$\beta(T_n) = \min[\rho(T_n), \beta(T_g), \beta(T_d)]$. Although the weighted root balance can be
equal to zero for trees of three nodes, we notice that the weighted balance
$\beta(T_n)$ for $n \ge 1$ is always strictly positive. We deduce from this definition
the following inequality:

for $n = 1$ or $n > 3$: $0 < \beta(T_n) \le \rho(T_n) \le \frac{1}{2}$.
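A minimal sketch of how the weighted root balance and the weighted balance could be computed is given below. The tuple representation `(weight, left, right)` and the function names are assumptions made for illustration. The code also assumes that every non-terminal node has exactly two sons and that the weight restriction stated earlier holds, so that $W - w_r > 0$.

```python
# Hypothetical representation (not from the paper): a node is a tuple
# (weight, left, right); a terminal node has left = right = None.
# Assumption: every non-terminal node has exactly two sons.

def total_weight(t):
    w, left, right = t
    return w if left is None else w + total_weight(left) + total_weight(right)

def size(t):
    _, left, right = t
    return 1 if left is None else 1 + size(left) + size(right)

def root_balance(t):
    """Weighted root balance rho(T_n), using the two-case definition."""
    _, left, right = t
    if left is None:                      # n = 1
        return 0.5
    wg, wd = total_weight(left), total_weight(right)
    s = wg + wd                           # equals W - w_r
    if wg * wd != 0:
        return min(wg, wd) / s
    # One subtree is a single node of weight zero: use the positive
    # weight of the other son, as if the two son weights were swapped.
    w_son = right[0] if wg == 0 else left[0]
    return min(w_son, s - w_son) / s

def weighted_balance(t):
    """Weighted balance beta(T_n): 1/2 for n = 1 or 3, else a recursive minimum."""
    _, left, right = t
    if size(t) in (1, 3):
        return 0.5
    return min(root_balance(t), weighted_balance(left), weighted_balance(right))

leaf = lambda w: (w, None, None)
t1 = (5.0, (2.0, leaf(1.0), leaf(3.0)), leaf(4.0))   # W_g = 6, W_d = 4
t2 = (1.0, leaf(0.0), (3.0, leaf(1.0), leaf(2.0)))   # W_g = 0
print(root_balance(t1), weighted_balance(t1), root_balance(t2))
```

For `t1` the root balance is 4/10; for `t2` the second case applies with $w = 3$ and $W_g + W_d - w = 3$, giving 1/2.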

Terminal weighted balance $\beta_T(T_n)$:

$\beta_T(T_n) = \rho(T_n)$ if $n = 1$ or $3$; otherwise

$$\beta_T(T_n) = \min[\beta_T(T_g), \beta_T(T_d)].$$

This definition implies that $0 \le \beta_T(T_n) \le \frac{1}{2}$. This parameter appears in the
first upper bound. Its evaluation is easy and it sharpens the bound significantly.
In particular, it makes the bound equal to the weighted path length for trees of
one or three nodes. However, in any case, if we don't want to estimate it we can
take it equal to zero.
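The recursion for the terminal weighted balance only looks at the subtrees of one or three nodes at the bottom of the tree. The following sketch illustrates this under the same hypothetical tuple representation `(weight, left, right)`, assuming every non-terminal node has two sons. For a three-node tree, both cases of the root balance definition reduce to the smaller son weight divided by the sum of the two son weights, which is what the code uses.

```python
# Hypothetical representation (not from the paper): a node is a tuple
# (weight, left, right); a terminal node has left = right = None.

def size(t):
    _, left, right = t
    return 1 if left is None else 1 + size(left) + size(right)

def terminal_balance(t):
    """Terminal weighted balance beta_T(T_n)."""
    _, left, right = t
    n = size(t)
    if n == 1:
        return 0.5                        # rho of a one-node tree
    if n == 3:
        wl, wr = left[0], right[0]        # weights of the two terminal sons
        return min(wl, wr) / (wl + wr)    # rho of a three-node tree
    return min(terminal_balance(left), terminal_balance(right))

leaf = lambda w: (w, None, None)
t = (5.0, (2.0, leaf(1.0), leaf(3.0)), leaf(4.0))
print(terminal_balance(t))
```

Here the left subtree has three nodes with son weights 1 and 3, so its balance is 1/4, and the right subtree is a single node with balance 1/2; the minimum is 1/4.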

Definition of $\gamma$: This parameter also appears in the first upper bound.
It can be verified easily that its value does not matter for trees of one or
three nodes. Moreover, for such trees the bound is equal to the path length for every value of
$\gamma$. Therefore, the value of $\gamma$ needs only to be estimated when $n > 3$.

if $n = 3$: $\gamma(T_3) = \delta(T_3)$

otherwise: $\gamma(T_n) = \min(\delta(T_n), \gamma(T_g), \gamma(T_d))$

with $\delta(T_n) = (\alpha + 1)\log(\alpha + 1) - \alpha\log\alpha$

and where $\alpha = \dfrac{W - w_r}{w_r}$.

The assumption made previously about the weight distribution implies that for
$n \ge 3$, $\gamma(T_n)$ always has a finite value, even if $w_r = 0$.

The expression of the first upper bound shows that the parameter $\gamma$ will
strongly sharpen the bound if its value is high. However, this parameter will be
useful only when we have sufficient knowledge of the weight distribution of
$T_n$, because its estimation implies some computation. Nevertheless, $\gamma$, as well as
$\beta_T$, can in any case be taken equal to zero. This corresponds to the fact that if
$n \ge 1$, then $W - w_r \ge 0$.

Entropy $H(x)$. We will introduce the following quantity in the two
bounds:

$$H(x) = -[x\log x + (1 - x)\log(1 - x)] \qquad \text{for } 0 \le x \le 1.$$
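As a small illustration, the entropy function can be written as follows, with base-two logarithms and the usual convention $0 \log 0 = 0$ at the endpoints. The function name `H` mirrors the notation of the text; the sample arguments are arbitrary.

```python
import math

def H(x):
    """Binary entropy -[x log2 x + (1 - x) log2 (1 - x)], with 0 log 0 = 0."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 0.0 or x == 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x))

print(H(0.0), H(0.25), H(0.4), H(0.5))
```

On the interval $[0, \frac{1}{2}]$ the function is increasing and reaches its maximum value 1 at $x = \frac{1}{2}$, which is the property the bounds rely on when a smaller balance is substituted for a larger one.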



$L$ or $L(T_n)$ will denote the sum of the weights of the terminal nodes of
$T_n$.

All the logarithms in this paper are taken to the base two. 



1. Inequality Used

Let $\{x_1, x_2, \ldots, x_n\}$ be a set of $n$ non-negative real numbers such
that $S = \sum_{i=1}^{n} x_i \ge 1$. Let $x_k$ denote any one of these $n$ numbers and $\alpha_k$ a real
positive number such that $S \ge (1 + \alpha_k)\,x_k$. Then we have the following inequality:

(1) $(S - x_k)\log(S - x_k) \le S\log S - x_k\log x_k - \delta\,x_k$

where $\delta = (\alpha_k + 1)\log(\alpha_k + 1) - \alpha_k\log\alpha_k$.

Proof: Let $f(x) = x\log x$ and let $a$ and $b$ be two real numbers such that $1 \le a$,
$0 \le b$. Then we have:

$$f(1 + b) - f(1) \le f(a + b) - f(a).$$

Applying this relation with $a = \dfrac{S - x_k}{\alpha_k x_k}$, $b = \dfrac{1}{\alpha_k}$, we obtain the previous
inequality (1).
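Inequality (1) can be checked numerically. The sketch below is illustrative only: the function names and the sample values of $S$, $x_k$ and $\alpha_k$ are assumptions chosen here. It returns the right-hand side minus the left-hand side, which should be non-negative whenever $S \ge (1 + \alpha_k)x_k$ and should vanish when $\alpha_k = (S - x_k)/x_k$.

```python
import math

def delta(alpha):
    """delta = (alpha + 1) log2(alpha + 1) - alpha log2(alpha), for alpha > 0."""
    return (alpha + 1.0) * math.log2(alpha + 1.0) - alpha * math.log2(alpha)

def xlog2x(x):
    """x log2 x with the convention 0 log 0 = 0."""
    return 0.0 if x == 0.0 else x * math.log2(x)

def lemma_gap(S, x, alpha):
    """Right-hand side minus left-hand side of inequality (1)."""
    if S < (1.0 + alpha) * x:
        raise ValueError("the hypothesis S >= (1 + alpha) * x is violated")
    lhs = xlog2x(S - x)
    rhs = xlog2x(S) - xlog2x(x) - delta(alpha) * x
    return rhs - lhs

print(lemma_gap(10.0, 2.0, 4.0))   # alpha = (S - x)/x: the two sides coincide
print(lemma_gap(10.0, 2.0, 1.0))   # smaller alpha: strict inequality
```

Since $\delta$ increases with $\alpha_k$, the largest admissible $\alpha_k$ gives the tightest form of the inequality.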

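2. First Upper Bound

Let $(T_n; w_1, \ldots, w_n)$ be a weighted binary tree. Then the weighted
path length satisfies the following inequality:

$$|T_n| \le \frac{1}{H(\beta)}\left[(W - w_r)\log(W - w_r) + \sum_{k=1}^{n} w_k\log\frac{1}{w_k} - w_r\log\frac{1}{w_r} - \gamma\,(W - w_r)\right] + (\gamma + 1 - H(\beta_T))\,L$$

where $\beta$, $\beta_T$, $\gamma$, $L$ have the definitions given in (II).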

Proof: Case (i). The weight distribution verifies the restriction introduced
previously and we assume that if $w_k \ne 0$ then $w_k \ge 1$ for all $k$.

a) For $n = 1$, $W = w_1$ and $L(T_1) = 0$, the assertion is true. We can
also write:

(2) $|T_1| \le \dfrac{1}{H(\beta)}\left[W\log W + w_1\log\dfrac{1}{w_1} - \gamma\,w_1\right] + (\gamma + 1 - H(\beta_T))\,w_1$

This inequality, which holds for every value of $\gamma$, will be useful in the following
part of the proof.

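b) Assume that the assertion is true for all $i < n$ and let
$(T_g; w'_1, \ldots, w'_g)$, $(T_d; w''_1, \ldots, w''_d)$ be the left and right subtrees of $T_n$. Hence
$w'_k = w_k$ for $1 \le k \le r-1$ and $w''_k = w_{k+r}$ for $1 \le k \le n-r$ ($g = r-1$ and $d = n-r$).
Let $\beta_g, \beta_{T_g}, \gamma_g$ and $\beta_d, \beta_{T_d}, \gamma_d$ be respectively the parameters of $T_g$ and $T_d$ defined
in (II), and let $w'_r$ and $w''_r$ be the weights of the roots of $T_g$ and $T_d$. Then, according to the relation $|T_n| = |T_g| + |T_d| + W_g + W_d$, we write:

$$\begin{aligned}
|T_n| \le{} & \frac{1}{H(\beta_g)}\left[(W_g - w'_r)\log(W_g - w'_r) + \sum_{k=1}^{g} w'_k\log\frac{1}{w'_k} - w'_r\log\frac{1}{w'_r} - \gamma_g\,(W_g - w'_r)\right] \\
& + \frac{1}{H(\beta_d)}\left[(W_d - w''_r)\log(W_d - w''_r) + \sum_{k=1}^{d} w''_k\log\frac{1}{w''_k} - w''_r\log\frac{1}{w''_r} - \gamma_d\,(W_d - w''_r)\right] \\
& + (\gamma_g + 1 - H(\beta_{T_g}))\,L_g + (\gamma_d + 1 - H(\beta_{T_d}))\,L_d + W_g + W_d
\end{aligned}$$

Using the equality $L = L_g + L_d$ and the three relations:

$\beta(T_n) = \min[\rho(T_n), \beta_g, \beta_d]$, $\beta_T(T_n) = \min[\beta_{T_g}, \beta_{T_d}]$, $\gamma(T_n) = \min[\delta(T_n), \gamma_g, \gamma_d]$

defined in (II), we write:

$$\begin{aligned}
|T_n| \le{} & \frac{1}{H(\beta)}\Bigg[(W_g - w'_r)\log(W_g - w'_r) + (W_d - w''_r)\log(W_d - w''_r) + \sum_{k=1}^{g} w'_k\log\frac{1}{w'_k} - w'_r\log\frac{1}{w'_r} \\
& + \sum_{k=1}^{d} w''_k\log\frac{1}{w''_k} - w''_r\log\frac{1}{w''_r} - \gamma\,(W_g - w'_r + W_d - w''_r)\Bigg] + (\gamma + 1 - H(\beta_T))\,L + W_g + W_d .
\end{aligned}$$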

If we assume that $T_g$ and $T_d$ both have more than one node ($g > 1$ and $d > 1$), then
$W_g - w'_r \ge 1$ and $W_d - w''_r \ge 1$. We can apply the inequality (1) defined at the
beginning of (III). Moreover, using the relation $\gamma = \min[\delta(T_n), \gamma(T_g), \gamma(T_d)]$,
we obtain the pair of inequalities:

(3) $(W_g - w'_r)\log(W_g - w'_r) \le W_g\log W_g + w'_r\log\dfrac{1}{w'_r} - \gamma\,w'_r$

$(W_d - w''_r)\log(W_d - w''_r) \le W_d\log W_d + w''_r\log\dfrac{1}{w''_r} - \gamma\,w''_r$

We now obtain the following inequality:

(4) $|T_n| \le \dfrac{1}{H(\beta)}\left[W_g\log W_g + W_d\log W_d + \sum_{k=1}^{n} w_k\log\dfrac{1}{w_k} - w_r\log\dfrac{1}{w_r} - \gamma\,(W - w_r)\right] + (\gamma + 1 - H(\beta_T))\,L + W_g + W_d$



If, however, $T_g$ and $T_d$ are such that $g = 1$ or $d = 1$, we can't apply inequality
(1) because $W_g - w'_r$ or $W_d - w''_r$ is equal to zero. We will use instead the
expression (2) derived at the beginning of the proof when $n = 1$. Assume that
$T_g$ and $T_d$ are such that $g > 1$ and $d = 1$; then expression (2) holds for $T_d$,
where $\beta_d = \beta_{T_d} = \frac{1}{2}$, $\gamma_d$ having any value. Therefore, after similar steps as
before, we obtain:

$$\begin{aligned}
|T_n| \le{} & \frac{1}{H(\beta)}\Bigg[(W_g - w'_r)\log(W_g - w'_r) + W_d\log W_d + \sum_{k=1}^{g} w'_k\log\frac{1}{w'_k} - w'_r\log\frac{1}{w'_r} \\
& + w''_1\log\frac{1}{w''_1} - \gamma\,(W_g - w'_r + w''_1)\Bigg] + (\gamma + 1 - H(\beta_T))\,L + W_g + W_d
\end{aligned}$$

If now we apply the first of the pair of inequalities (3), we obtain the inequality
(4) already derived.

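Hence for every left and right subtree, we have:

$$|T_n| \le \frac{1}{H(\beta)}\left[W_g\log W_g + W_d\log W_d + \sum_{k=1}^{n} w_k\log\frac{1}{w_k} - w_r\log\frac{1}{w_r} - \gamma\,(W - w_r)\right] + (\gamma + 1 - H(\beta_T))\,L + W_g + W_d .$$

The quantity $W_g\log W_g + W_d\log W_d$ can be expressed in terms of the weighted root
balance $\rho(T_n)$:

$$\begin{aligned}
W_g\log W_g + W_d\log W_d &= \rho\,(W_g + W_d)\,[\log(W_g + W_d) + \log\rho] + (1 - \rho)(W_g + W_d)\,[\log(1 - \rho) + \log(W_g + W_d)] \\
&= (W - w_r)\log(W - w_r) + (W - w_r)\,[\rho\log\rho + (1 - \rho)\log(1 - \rho)]
\end{aligned}$$

$\rho\log\frac{1}{\rho} + (1 - \rho)\log\frac{1}{1 - \rho}$ is the entropy of the weighted root balance of $T_n$.
Then if we recall that $0 < \beta \le \rho \le \frac{1}{2}$, we have $0 < H(\beta) \le H(\rho) \le 1$, and

$$|T_n| \le \frac{1}{H(\beta)}\left[(W - w_r)\log(W - w_r) + \sum_{k=1}^{n} w_k\log\frac{1}{w_k} - w_r\log\frac{1}{w_r} - \gamma\,(W - w_r)\right] + (\gamma + 1 - H(\beta_T))\,L + \left(1 - \frac{H(\rho)}{H(\beta)}\right)(W - w_r) .$$

Since $H(\rho) \ge H(\beta)$, the last term is not positive and can be dropped.
Hence the induction hypothesis is verified. The weighted path length $|T_n|$
satisfies the following inequality:

$$|T_n| \le \frac{1}{H(\beta)}\left[(W - w_r)\log(W - w_r) + \sum_{k=1}^{n} w_k\log\frac{1}{w_k} - w_r\log\frac{1}{w_r} - \gamma\,(W - w_r)\right] + (\gamma + 1 - H(\beta_T))\,L$$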

Case (ii): In the general case, let $w'_k = \dfrac{w_k}{w_{\min}}$ where $w_{\min}$ is the
minimum of all the positive weights. Then $w'_k \ge 1$ for all $k$ such that $w_k \ne 0$.
The result obtained in case (i), applied to this new weight distribution, gives
immediately the desired result.

Remark: Except for particular distributions of the weights like a descending one,
for example, it is difficult to obtain an upper bound which is both sharp and
simple. If we don't want to introduce the parameters $\gamma$ and $\beta_T$, we can set both
of them equal to zero. This leads us to the simpler bound:

$$|T_n| \le \frac{1}{H(\beta)}\left[(W - w_r)\log(W - w_r) + \sum_{k=1}^{n} w_k\log\frac{1}{w_k} - w_r\log\frac{1}{w_r}\right] + L$$
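The simpler bound only needs the list of weights, the weight of the root, the sum $L$ of the terminal weights and the weighted balance $\beta$. The sketch below evaluates it for a hypothetical five-node tree chosen for illustration: root weight 5, sons of weight 2 and 4, and two terminal nodes of weight 1 and 3 under the first son. For that tree $|T_n| = 14$, $L = 8$ and $\beta = 0.4$, all computed by hand from the definitions. The function and parameter names are assumptions introduced here.

```python
import math

def H(x):
    """Binary entropy in bits, with 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x))

def w_log_inv(w):
    """w * log2(1/w), taken as 0 for a zero weight."""
    return 0.0 if w == 0.0 else -w * math.log2(w)

def simple_first_bound(weights, root_weight, terminal_sum, beta):
    """Simpler form of the first upper bound (gamma = 0 and beta_T = 0).

    weights      -- weights of all nodes, root included
    root_weight  -- w_r
    terminal_sum -- L, the sum of the weights of the terminal nodes
    beta         -- weighted balance of the tree, assumed to lie in (0, 1/2]
    """
    rest = sum(weights) - root_weight                      # W - w_r
    entropy_term = sum(w_log_inv(w) for w in weights) - w_log_inv(root_weight)
    bracket = (rest * math.log2(rest) if rest > 0 else 0.0) + entropy_term
    return bracket / H(beta) + terminal_sum

bound = simple_first_bound([5.0, 2.0, 1.0, 3.0, 4.0], 5.0, 8.0, 0.4)
print(bound)   # to be compared with the exact path length, 14
```

For this example the bound evaluates to about 27, against an exact weighted path length of 14, which gives an idea of how much is lost by dropping $\gamma$ and $\beta_T$.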
3. Second Upper Bound

Although the entropy $\sum_{k=1}^{n} w_k\log\frac{1}{w_k}$ is a natural quantity, it may not
always be known. For such cases, the following bound would be useful.

$$|T_n| \le \frac{1}{H(\beta)}\left[(W - w_r)\log\frac{W - w_r}{w_{\min}} - 2\,(W - w_r)\right] + 2L$$

$w_{\min}$ is the minimum of all the positive weights of $T_n$. The proof is quite similar.
We would have the following substitutions in part (III).
Inequality Used:

(1) $(S - x_k)\log(S - x_k) \le S\log S - 2\,x_k$

with $S - x_k \ge 1$, and $x_k = 0$ or $x_k \ge 1$.

Proof: case (i)

(2) $|T_1| \le [W\log W - 2\,w_1] + 2\,w_1$

(3) $(W_g - w'_r)\log(W_g - w'_r) \le W_g\log W_g - 2\,w'_r$

$(W_d - w''_r)\log(W_d - w''_r) \le W_d\log W_d - 2\,w''_r$

(4) $|T_n| \le \dfrac{1}{H(\beta)}\left[W_g\log W_g + W_d\log W_d - 2\,(W_g + W_d)\right] + 2L_g + 2L_d + W_g + W_d$

case (ii) remains the same.
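As an illustration, the second bound can be evaluated from four scalar quantities: $W - w_r$, the smallest positive weight $w_{\min}$, the weighted balance $\beta$ and the terminal weight sum $L$. The sketch below uses hypothetical values corresponding to a five-node tree with weights 5 (root), 2, 4, 1, 3, for which $W - w_r = 10$, $w_{\min} = 1$, $\beta = 0.4$, $L = 8$ and the exact weighted path length is 14. The function and parameter names are assumptions, and the formula is the one read from the statement above.

```python
import math

def H(x):
    """Binary entropy in bits, with 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x))

def second_bound(rest, w_min, beta, terminal_sum):
    """Second upper bound on the weighted path length.

    rest         -- W - w_r, total weight without the root (assumed > 0)
    w_min        -- smallest strictly positive weight in the tree
    beta         -- weighted balance, assumed to lie in (0, 1/2]
    terminal_sum -- L, the sum of the weights of the terminal nodes
    """
    bracket = rest * math.log2(rest / w_min) - 2.0 * rest
    return bracket / H(beta) + 2.0 * terminal_sum

b = second_bound(10.0, 1.0, 0.4, 8.0)
print(b)   # to be compared with the exact path length, 14
```

No individual weight other than $w_{\min}$ enters the computation, which is the point of this bound when the entropy of the weights is not available.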